{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现使用iris数据集，建立变量Sepal.Length、Sepal.Width、Petal.Length对变量Petal.Width的Lasso回归模型，并手动编写Python对Lasso算法进行实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "iris=pd.read_csv(\"http://image.cador.cn/data/iris.csv\")\n",
    "x = iris.iloc[:,[0,1,2]]\n",
    "y = iris.iloc[:,3]\n",
    "x = x.apply(lambda x:(x - np.mean(x))/np.sqrt(np.sum((x - np.mean(x))**2))).values\n",
    "y = (y - np.mean(y)).values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先对自变量数据矩阵x进行中心标准化处理，对目标变量y进行去中心化处理。其次声明Lasso算法需要用到的变量，并进行初始化处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#活动变量下标集合\n",
    "m = x.shape[1]\n",
    "active = []\n",
    "max_steps = m + 1\n",
    "\n",
    "#初始化回归系数矩阵\n",
    "beta = np.zeros((max_steps, m))\n",
    "eps = 2.220446e-16\n",
    "\n",
    "# 非活动变量与残差的相关系数\n",
    "C = []\n",
    "Sign = []\n",
    "\n",
    "#非活动变量下标集合\n",
    "im = range(m)\n",
    "inactive = range(m)\n",
    "k = 0\n",
    "\n",
    "#计算y与x的相关性\n",
    "Cvec = np.matmul(y.T,x)\n",
    "\n",
    "#被忽略的变量下标集合\n",
    "ignores = []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当达到最大迭代次数或活动变量集A中的元素个数达到最大自变量个数的时候退出循环\n",
    "\n",
    "当发现候选变量中的最佳相关系数接近于0时，也可退出循环"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "while k < max_steps and len(active) < m:\n",
    "    C = Cvec[inactive]\n",
    "    Cmax = np.max(np.abs(C))\n",
    "    if Cmax < eps*100:\n",
    "        print(\"最大的相关系数为0，退出循环\\n\")\n",
    "        break\n",
    "    new = np.abs(C) >= Cmax - eps\n",
    "    C = C[np.logical_not(new)]\n",
    "    new = np.array(inactive)[new]\n",
    "    for inew in new:\n",
    "        if np.linalg.matrix_rank(x[:,active+[inew]]) == len(active):\n",
    "            ignores.append(inew)\n",
    "        else:\n",
    "            active.append(inew)\n",
    "            Sign.append(np.sign(Cvec[inew]))\n",
    "    \n",
    "    active_len = len(active)\n",
    "    exclude = active + ignores\n",
    "    inactive=[]\n",
    "    t0 = [inactive.append(v) if i not in exclude else None for i,v in enumerate(im)]\n",
    "    xa = x[:,active]*Sign\n",
    "    oneA = [1]*active_len\n",
    "    A = np.matmul(np.matmul(oneA,np.linalg.inv(np.matmul(xa.T,xa))),oneA)**(-0.5)\n",
    "    w = np.matmul(A*np.linalg.inv(np.matmul(xa.T,xa)),oneA)\n",
    "    if active_len >= m:\n",
    "        gamhat = Cmax/A\n",
    "    else:\n",
    "        a = np.matmul(np.matmul(x[:,inactive].T,xa),w)\n",
    "        gam = np.array([(Cmax - C)/(A - a), (Cmax + C)/(A + a)])\n",
    "        gamhat = np.min([np.min(gam[gam > eps]),Cmax/A])\n",
    "\n",
    "    b1 = beta[k, active]\n",
    "    z1 = np.array(-b1/(w*Sign))\n",
    "    zmin = np.min(z1[z1 > eps].tolist()+[gamhat])\n",
    "    gamhat = zmin if zmin < gamhat else gamhat\n",
    "    beta[k + 1, active] = beta[k, active] + gamhat*w*Sign\n",
    "    Cvec = Cvec - gamhat*np.matmul(np.matmul(xa,w).T,x)\n",
    "    k=k+1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "得到回归系数的估计矩阵beta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.        ,  0.        ,  0.        ],\n",
       "       [ 0.        ,  0.        ,  8.65652655],\n",
       "       [ 0.        ,  0.27627203,  8.93279858],\n",
       "       [-2.09501133,  1.18554279, 11.29305357]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "beta"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
